Urdu Dependency Parser: A Data-Driven Approach

Authors

  • Wajid Ali
  • Sarmad Hussain
Abstract

In this paper, we present what we believe to be the first data-driven dependency parser for Urdu. The parser was trained and tuned using the MaltParser system, a system for data-driven dependency parsing. The Urdu Dependency Treebank (UDT), which is used to train and test the parser, is also presented here for the first time. The UDT contains a corpus of 2,853 sentences annotated at multiple levels: part of speech (POS), chunk (phrase level), and dependency relations. For each token it also records the token counter (the token's position in the sentence) and the head of the token. All annotation was done manually. The parser is evaluated through a series of experiments, all performed with MaltParser's default algorithm and different feature models. Initially, a simple baseline feature model consisting of word position, word, head, and dependency relation is used. This feature model is then enhanced by adding POS and chunk (phrase-level) information. The results of all parsing experiments are reported. The best overall results are a labeled accuracy (LA) of 74.48% and an unlabeled attachment score (UAS) of 90.14%. Error analysis is performed by comparing the parser output with the manually parsed treebank test data, in order to analyze and classify the different types of errors the parser produces. This analysis is useful for identifying directions for future expansion of the treebank and for improving parsing accuracy.
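The abstract reports accuracy as LA and UAS over annotations that store a position, word, POS tag, chunk tag, head, and dependency label for each token. The minimal Python sketch below shows one way such scores can be computed by comparing parser output with gold treebank data. It assumes a hypothetical tab-separated layout (ID, FORM, POS, CHUNK, HEAD, DEPREL); the actual UDT file format is not described here. The abstract does not define LA precisely, so the sketch reports two measures. The first is label accuracy (LA in the CoNLL-X sense: correct label regardless of head). The second is labeled attachment score (LAS: correct head and label). The example sentence, tags, and labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # token counter: 1-based position in the sentence
    form: str    # word form
    pos: str     # part-of-speech tag
    chunk: str   # chunk (phrase-level) tag
    head: int    # index of the head token (0 = artificial root)
    deprel: str  # dependency relation label


def parse_sentence(lines):
    """Parse rows in a HYPOTHETICAL layout: ID, FORM, POS, CHUNK, HEAD, DEPREL."""
    tokens = []
    for line in lines:
        idx, form, pos, chunk, head, deprel = line.split("\t")
        tokens.append(Token(int(idx), form, pos, chunk, int(head), deprel))
    return tokens


def evaluate(gold, pred):
    """Compare aligned gold and predicted tokens; return UAS, LAS and label accuracy."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sentences must have the same length")
    n = len(gold)
    pairs = list(zip(gold, pred))
    head_ok = sum(g.head == p.head for g, p in pairs)        # correct head only
    label_ok = sum(g.deprel == p.deprel for g, p in pairs)   # correct label only
    both_ok = sum(g.head == p.head and g.deprel == p.deprel for g, p in pairs)
    return {"UAS": head_ok / n, "LAS": both_ok / n, "LA": label_ok / n}


# Hypothetical romanized sentence "Ahmad ne kitab parhi" (Ahmad read the book);
# tags and labels are illustrative only, not the UDT tagset.
gold = parse_sentence([
    "1\tAhmad\tNNP\tNP\t4\tsubj",
    "2\tne\tPSP\tNP\t1\tcase",
    "3\tkitab\tNN\tNP\t4\tobj",
    "4\tparhi\tVM\tVGF\t0\troot",
])
pred = parse_sentence([
    "1\tAhmad\tNNP\tNP\t4\tobj",    # right head, wrong label
    "2\tne\tPSP\tNP\t1\tcase",
    "3\tkitab\tNN\tNP\t1\tobj",     # wrong head, right label
    "4\tparhi\tVM\tVGF\t0\troot",
])
print(evaluate(gold, pred))  # → {'UAS': 0.75, 'LAS': 0.5, 'LA': 0.75}
```

In a corpus-level evaluation the counts would be summed over all test sentences before dividing, so longer sentences carry proportionally more weight.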

Publication date: 2010